<!DOCTYPE html>
<html lang="en" xml:lang="en">
  <head>
    <title>Scraping top 250 movies on IMDB</title>
    <meta charset="utf-8" />
    <meta name="author" content="datasciencebox.org" />
    <script src="libs/header-attrs/header-attrs.js"></script>
    <link href="libs/font-awesome/css/all.css" rel="stylesheet" />
    <link href="libs/font-awesome/css/v4-shims.css" rel="stylesheet" />
    <link href="libs/panelset/panelset.css" rel="stylesheet" />
    <script src="libs/panelset/panelset.js"></script>
    <script src="libs/htmlwidgets/htmlwidgets.js"></script>
    <link href="libs/datatables-css/datatables-crosstalk.css" rel="stylesheet" />
    <script src="libs/datatables-binding/datatables.js"></script>
    <script src="libs/jquery/jquery-3.6.0.min.js"></script>
    <link href="libs/dt-core/css/jquery.dataTables.min.css" rel="stylesheet" />
    <link href="libs/dt-core/css/jquery.dataTables.extra.css" rel="stylesheet" />
    <script src="libs/dt-core/js/jquery.dataTables.min.js"></script>
    <link href="libs/crosstalk/css/crosstalk.min.css" rel="stylesheet" />
    <script src="libs/crosstalk/js/crosstalk.min.js"></script>
    <link rel="stylesheet" href="../xaringan-themer.css" type="text/css" />
    <link rel="stylesheet" href="../slides.css" type="text/css" />
  </head>
  <body>
    <textarea id="source">
class: center, middle, inverse, title-slide

.title[
# Scraping top 250 movies on IMDB
]
.subtitle[
## <br><br> Data Science in a Box
]
.author[
### <a href="https://datasciencebox.org/">datasciencebox.org</a>
]

---





layout: true
  
&lt;div class="my-footer"&gt;
&lt;span&gt;
&lt;a href="https://datasciencebox.org" target="_blank"&gt;datasciencebox.org&lt;/a&gt;
&lt;/span&gt;
&lt;/div&gt; 

---



class: middle

# Top 250 movies on IMDB

---

## Top 250 movies on IMDB

Take a look at the page's source code and look for the `table` tag:
&lt;br&gt;
http://www.imdb.com/chart/top

.pull-left[
&lt;img src="img/imdb-top-250.png" width="100%" style="display: block; margin: auto;" /&gt;
]
.pull-right[
&lt;img src="img/imdb-top-250-source.png" width="94%" style="display: block; margin: auto;" /&gt;
]


---

## First check if you're allowed!


```r
library(robotstxt)
paths_allowed("http://www.imdb.com")
```

```
## [1] TRUE
```

vs. e.g.


```r
paths_allowed("http://www.facebook.com")
```

```
## [1] FALSE
```

---

## Plan

&lt;img src="img/plan.png" width="90%" style="display: block; margin: auto;" /&gt;

---

## Plan

1. Read the whole page

2. Scrape movie titles and save as `titles` 

3. Scrape years movies were made in and save as `years`

4. Scrape IMDB ratings and save as `ratings`

5. Create a data frame called `imdb_top_250` with variables `title`, `year`, and `rating`
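
---

## Plan, sketched on a toy page

The five steps above can be tried offline first. This is only a sketch: `toy_page` is a hypothetical two-row stand-in that borrows the class names used later in these slides, not real IMDB markup.

```r
library(rvest)    # read_html(), html_nodes(), html_text()
library(stringr)  # str_remove()
library(tibble)   # tibble()

# 1. "Read the whole page" (here: a hypothetical inline HTML string)
toy_page <- read_html('
  <table>
    <tr><td class="titleColumn"><a href="/title/a/">Movie A</a>
        <span class="secondaryInfo">(1994)</span></td>
        <td><strong>9.2</strong></td></tr>
    <tr><td class="titleColumn"><a href="/title/b/">Movie B</a>
        <span class="secondaryInfo">(1972)</span></td>
        <td><strong>9.0</strong></td></tr>
  </table>')

# 2. Titles: text of the links inside the title cells
titles <- toy_page %>% html_nodes(".titleColumn a") %>% html_text()

# 3. Years: strip the parentheses, then convert to numeric
years <- toy_page %>% html_nodes(".secondaryInfo") %>% html_text() %>%
  str_remove("\\(") %>% str_remove("\\)") %>% as.numeric()

# 4. Ratings: text of the <strong> nodes, converted to numeric
ratings <- toy_page %>% html_nodes("strong") %>% html_text() %>% as.numeric()

# 5. Combine the three vectors into one data frame
toy_top <- tibble(title = titles, year = years, rating = ratings)
toy_top
```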

---

class: middle

# Step 1. Read the whole page

---

## Read the whole page


```r
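library(tidyverse) # stringr, tibble, and dplyr functions used in later steps
library(rvest)     # read_html(), html_nodes(), html_text()
page &lt;- read_html("https://www.imdb.com/chart/top/")
page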
```

```
## {html_document}
## &lt;html xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml"&gt;
## [1] &lt;head&gt;\n&lt;meta http-equiv="Content-Type" content="text/html ...
## [2] &lt;body id="styleguide-v2" class="fixed"&gt;\n            &lt;img  ...
```

---

## A webpage in R

- Result is a list with 2 elements


```r
typeof(page)
```

```
## [1] "list"
```

--

- that we need to convert to something more familiar, like a data frame...


```r
class(page)
```

```
## [1] "xml_document" "xml_node"
```

---

class: middle

# Step 2. Scrape movie titles and save as `titles` 

---

## Scrape movie titles

&lt;img src="img/titles.png" width="70%" style="display: block; margin: auto;" /&gt;

---

## Scrape the nodes

.pull-left[

```r
page %&gt;%
  html_nodes(".titleColumn a")
```

```
## {xml_nodeset (250)}
##  [1] &lt;a href="/title/tt0111161/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [2] &lt;a href="/title/tt0068646/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [3] &lt;a href="/title/tt0468569/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [4] &lt;a href="/title/tt0071562/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [5] &lt;a href="/title/tt0050083/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [6] &lt;a href="/title/tt0108052/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [7] &lt;a href="/title/tt0167260/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [8] &lt;a href="/title/tt0110912/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
##  [9] &lt;a href="/title/tt0120737/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [10] &lt;a href="/title/tt0060196/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [11] &lt;a href="/title/tt0109830/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [12] &lt;a href="/title/tt0137523/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [13] &lt;a href="/title/tt1375666/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [14] &lt;a href="/title/tt0167261/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [15] &lt;a href="/title/tt0080684/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
## [16] &lt;a href="/title/tt0133093/?pf_rd_m=A2FGELUUNOQJNL&amp;amp;pf_ ...
...
```
]
.pull-right[
&lt;img src="img/titles.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Extract the text from the nodes

.pull-left[

```r
page %&gt;%
  html_nodes(".titleColumn a") %&gt;%
  html_text()
```

```
##   [1] "The Shawshank Redemption"                                            
##   [2] "The Godfather"                                                       
##   [3] "The Dark Knight"                                                     
##   [4] "The Godfather Part II"                                               
##   [5] "12 Angry Men"                                                        
##   [6] "Schindler's List"                                                    
##   [7] "The Lord of the Rings: The Return of the King"                       
##   [8] "Pulp Fiction"                                                        
##   [9] "The Lord of the Rings: The Fellowship of the Ring"                   
##  [10] "The Good, the Bad and the Ugly"                                      
##  [11] "Forrest Gump"                                                        
##  [12] "Fight Club"                                                          
##  [13] "Inception"                                                           
##  [14] "The Lord of the Rings: The Two Towers"                               
##  [15] "Star Wars: Episode V - The Empire Strikes Back"                      
##  [16] "The Matrix"                                                          
...
```
]
.pull-right[
&lt;img src="img/titles.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Save as `titles`

.pull-left[

```r
titles &lt;- page %&gt;%
  html_nodes(".titleColumn a") %&gt;%
  html_text()

titles
```

```
##   [1] "The Shawshank Redemption"                                            
##   [2] "The Godfather"                                                       
##   [3] "The Dark Knight"                                                     
##   [4] "The Godfather Part II"                                               
##   [5] "12 Angry Men"                                                        
##   [6] "Schindler's List"                                                    
##   [7] "The Lord of the Rings: The Return of the King"                       
##   [8] "Pulp Fiction"                                                        
##   [9] "The Lord of the Rings: The Fellowship of the Ring"                   
##  [10] "The Good, the Bad and the Ugly"                                      
##  [11] "Forrest Gump"                                                        
##  [12] "Fight Club"                                                          
##  [13] "Inception"                                                           
##  [14] "The Lord of the Rings: The Two Towers"                               
...
```
]
.pull-right[
&lt;img src="img/titles.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

class: middle

# Step 3. Scrape years movies were made in and save as `years`

---

## Scrape years movies were made in

&lt;img src="img/years.png" width="70%" style="display: block; margin: auto;" /&gt;

---

## Scrape the nodes

.pull-left[

```r
page %&gt;%
  html_nodes(".secondaryInfo")
```

```
## {xml_nodeset (250)}
##  [1] &lt;span class="secondaryInfo"&gt;(1994)&lt;/span&gt;
##  [2] &lt;span class="secondaryInfo"&gt;(1972)&lt;/span&gt;
##  [3] &lt;span class="secondaryInfo"&gt;(2008)&lt;/span&gt;
##  [4] &lt;span class="secondaryInfo"&gt;(1974)&lt;/span&gt;
##  [5] &lt;span class="secondaryInfo"&gt;(1957)&lt;/span&gt;
##  [6] &lt;span class="secondaryInfo"&gt;(1993)&lt;/span&gt;
##  [7] &lt;span class="secondaryInfo"&gt;(2003)&lt;/span&gt;
##  [8] &lt;span class="secondaryInfo"&gt;(1994)&lt;/span&gt;
##  [9] &lt;span class="secondaryInfo"&gt;(2001)&lt;/span&gt;
## [10] &lt;span class="secondaryInfo"&gt;(1966)&lt;/span&gt;
## [11] &lt;span class="secondaryInfo"&gt;(1994)&lt;/span&gt;
## [12] &lt;span class="secondaryInfo"&gt;(1999)&lt;/span&gt;
## [13] &lt;span class="secondaryInfo"&gt;(2010)&lt;/span&gt;
## [14] &lt;span class="secondaryInfo"&gt;(2002)&lt;/span&gt;
## [15] &lt;span class="secondaryInfo"&gt;(1980)&lt;/span&gt;
## [16] &lt;span class="secondaryInfo"&gt;(1999)&lt;/span&gt;
...
```
]
.pull-right[
&lt;img src="img/years.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Extract the text from the nodes

.pull-left[

```r
page %&gt;%
  html_nodes(".secondaryInfo") %&gt;%
  html_text()
```

```
##   [1] "(1994)" "(1972)" "(2008)" "(1974)" "(1957)" "(1993)"
##   [7] "(2003)" "(1994)" "(2001)" "(1966)" "(1994)" "(1999)"
##  [13] "(2010)" "(2002)" "(1980)" "(1999)" "(1990)" "(1975)"
##  [19] "(1995)" "(1954)" "(1946)" "(1991)" "(2002)" "(1998)"
##  [25] "(1997)" "(1999)" "(2014)" "(1977)" "(1991)" "(1985)"
##  [31] "(2001)" "(1960)" "(2002)" "(1994)" "(2019)" "(1994)"
##  [37] "(2000)" "(1998)" "(1995)" "(2006)" "(2006)" "(1942)"
##  [43] "(2022)" "(2014)" "(2011)" "(1936)" "(1962)" "(1968)"
##  [49] "(1988)" "(1954)" "(1979)" "(1931)" "(1988)" "(2000)"
##  [55] "(1979)" "(1981)" "(2012)" "(2008)" "(2006)" "(1950)"
##  [61] "(1957)" "(1980)" "(1940)" "(1957)" "(2018)" "(1986)"
##  [67] "(1999)" "(1964)" "(2012)" "(2018)" "(2019)" "(2003)"
##  [73] "(1995)" "(1984)" "(1995)" "(2017)" "(1981)" "(2009)"
##  [79] "(1997)" "(2019)" "(1984)" "(1997)" "(2000)" "(2010)"
##  [85] "(2016)" "(1952)" "(2009)" "(1983)" "(1968)" "(2004)"
##  [91] "(1992)" "(1963)" "(2018)" "(1941)" "(1962)" "(2012)"
...
```
]
.pull-right[
&lt;img src="img/years.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Clean up the text

We need to go from `"(1994)"` to `1994`:

- Remove `(` and `)`: string manipulation

- Convert to numeric: `as.numeric()`

---

## stringr

.pull-left-wide[
- **stringr** provides a cohesive set of functions designed to make working with strings as easy as possible
- Functions in stringr start with `str_*()`, e.g.
  - `str_remove()` to remove a pattern from a string
  
  ```r
  str_remove(string = "jello", pattern = "el")
  ```
  
  ```
  ## [1] "jlo"
  ```
  - `str_replace()` to replace a pattern with another
  
  ```r
  str_replace(string = "jello", pattern = "j", replacement = "h")
  ```
  
  ```
  ## [1] "hello"
  ```
]
.pull-right-narrow[
&lt;img src="img/stringr.png" width="100%" style="display: block; margin: auto auto auto 0;" /&gt;
]

---

## Clean up the text


```r
page %&gt;%
  html_nodes(".secondaryInfo") %&gt;%
  html_text() %&gt;%
  str_remove("\\(") # remove (
```

```
##   [1] "1994)" "1972)" "2008)" "1974)" "1957)" "1993)" "2003)"
##   [8] "1994)" "2001)" "1966)" "1994)" "1999)" "2010)" "2002)"
##  [15] "1980)" "1999)" "1990)" "1975)" "1995)" "1954)" "1946)"
##  [22] "1991)" "2002)" "1998)" "1997)" "1999)" "2014)" "1977)"
##  [29] "1991)" "1985)" "2001)" "1960)" "2002)" "1994)" "2019)"
##  [36] "1994)" "2000)" "1998)" "1995)" "2006)" "2006)" "1942)"
##  [43] "2022)" "2014)" "2011)" "1936)" "1962)" "1968)" "1988)"
##  [50] "1954)" "1979)" "1931)" "1988)" "2000)" "1979)" "1981)"
##  [57] "2012)" "2008)" "2006)" "1950)" "1957)" "1980)" "1940)"
##  [64] "1957)" "2018)" "1986)" "1999)" "1964)" "2012)" "2018)"
##  [71] "2019)" "2003)" "1995)" "1984)" "1995)" "2017)" "1981)"
##  [78] "2009)" "1997)" "2019)" "1984)" "1997)" "2000)" "2010)"
##  [85] "2016)" "1952)" "2009)" "1983)" "1968)" "2004)" "1992)"
##  [92] "1963)" "2018)" "1941)" "1962)" "2012)" "1959)" "1931)"
##  [99] "1958)" "2001)" "1971)" "1985)" "1987)" "1944)" "1960)"
...
```

---

## Clean up the text


```r
page %&gt;%
  html_nodes(".secondaryInfo") %&gt;%
  html_text() %&gt;%
  str_remove("\\(") %&gt;% # remove (
  str_remove("\\)") # remove )
```

```
##   [1] "1994" "1972" "2008" "1974" "1957" "1993" "2003" "1994"
##   [9] "2001" "1966" "1994" "1999" "2010" "2002" "1980" "1999"
##  [17] "1990" "1975" "1995" "1954" "1946" "1991" "2002" "1998"
##  [25] "1997" "1999" "2014" "1977" "1991" "1985" "2001" "1960"
##  [33] "2002" "1994" "2019" "1994" "2000" "1998" "1995" "2006"
##  [41] "2006" "1942" "2022" "2014" "2011" "1936" "1962" "1968"
##  [49] "1988" "1954" "1979" "1931" "1988" "2000" "1979" "1981"
##  [57] "2012" "2008" "2006" "1950" "1957" "1980" "1940" "1957"
##  [65] "2018" "1986" "1999" "1964" "2012" "2018" "2019" "2003"
##  [73] "1995" "1984" "1995" "2017" "1981" "2009" "1997" "2019"
##  [81] "1984" "1997" "2000" "2010" "2016" "1952" "2009" "1983"
##  [89] "1968" "2004" "1992" "1963" "2018" "1941" "1962" "2012"
##  [97] "1959" "1931" "1958" "2001" "1971" "1985" "1987" "1944"
## [105] "1960" "1983" "1952" "1973" "1962" "1976" "1997" "2009"
...
```

---

## Convert to numeric


```r
page %&gt;%
  html_nodes(".secondaryInfo") %&gt;%
  html_text() %&gt;%
  str_remove("\\(") %&gt;% # remove (
  str_remove("\\)") %&gt;% # remove )
  as.numeric()
```

```
##   [1] 1994 1972 2008 1974 1957 1993 2003 1994 2001 1966 1994 1999
##  [13] 2010 2002 1980 1999 1990 1975 1995 1954 1946 1991 2002 1998
##  [25] 1997 1999 2014 1977 1991 1985 2001 1960 2002 1994 2019 1994
##  [37] 2000 1998 1995 2006 2006 1942 2022 2014 2011 1936 1962 1968
##  [49] 1988 1954 1979 1931 1988 2000 1979 1981 2012 2008 2006 1950
##  [61] 1957 1980 1940 1957 2018 1986 1999 1964 2012 2018 2019 2003
##  [73] 1995 1984 1995 2017 1981 2009 1997 2019 1984 1997 2000 2010
##  [85] 2016 1952 2009 1983 1968 2004 1992 1963 2018 1941 1962 2012
##  [97] 1959 1931 1958 2001 1971 1985 1987 1944 1960 1983 1952 1973
## [109] 1962 1976 1997 2009 1995 2020 1927 2011 2000 1988 2010 1989
## [121] 1948 2021 2019 2007 2004 1965 2005 2016 1921 1959 2022 2020
## [133] 1950 2018 2013 1961 1992 1995 1985 2006 2007 1999 2001 1975
## [145] 1998 1961 1948 2010 1950 1963 1993 2003 2007 2003 1980 1980
...
```

---

## Save as `years`

.pull-left[

```r
years &lt;- page %&gt;%
  html_nodes(".secondaryInfo") %&gt;%
  html_text() %&gt;%
  str_remove("\\(") %&gt;% # remove (
  str_remove("\\)") %&gt;% # remove )
  as.numeric()

years
```

```
##   [1] 1994 1972 2008 1974 1957 1993 2003 1994 2001 1966 1994 1999
##  [13] 2010 2002 1980 1999 1990 1975 1995 1954 1946 1991 2002 1998
##  [25] 1997 1999 2014 1977 1991 1985 2001 1960 2002 1994 2019 1994
##  [37] 2000 1998 1995 2006 2006 1942 2022 2014 2011 1936 1962 1968
##  [49] 1988 1954 1979 1931 1988 2000 1979 1981 2012 2008 2006 1950
##  [61] 1957 1980 1940 1957 2018 1986 1999 1964 2012 2018 2019 2003
##  [73] 1995 1984 1995 2017 1981 2009 1997 2019 1984 1997 2000 2010
##  [85] 2016 1952 2009 1983 1968 2004 1992 1963 2018 1941 1962 2012
##  [97] 1959 1931 1958 2001 1971 1985 1987 1944 1960 1983 1952 1973
## [109] 1962 1976 1997 2009 1995 2020 1927 2011 2000 1988 2010 1989
## [121] 1948 2021 2019 2007 2004 1965 2005 2016 1921 1959 2022 2020
...
```
]
.pull-right[
&lt;img src="img/years.png" width="100%" style="display: block; margin: auto;" /&gt;
]
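
---

## Clean up in one step

The two `str_remove()` calls can be collapsed into one. This is a sketch on `raw_years`, a hypothetical sample of scraped strings; it shows two alternatives and is not the approach used in these slides.

```r
library(stringr)

# Hypothetical sample of scraped year strings
raw_years <- c("(1994)", "(1972)", "(2008)")

# Option 1: remove every "(" and ")" using a character class
years_a <- as.numeric(str_remove_all(raw_years, "[()]"))

# Option 2: keep only the run of four digits
years_b <- as.numeric(str_extract(raw_years, "\\d{4}"))

# Both options should give the same numeric vector
identical(years_a, years_b)
```

Option 2 may be more forgiving if a string carries extra text around the year, since it ignores anything that is not a four-digit run.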

---

class: middle

# Step 4. Scrape IMDB ratings and save as `ratings`

---

## Scrape IMDB ratings

&lt;img src="img/ratings.png" width="70%" style="display: block; margin: auto;" /&gt;

---

## Scrape the nodes

.pull-left[

```r
page %&gt;%
  html_nodes("strong")
```

```
## {xml_nodeset (250)}
##  [1] &lt;strong title="9.2 based on 2,598,663 user ratings"&gt;9.2&lt;/ ...
##  [2] &lt;strong title="9.2 based on 1,794,051 user ratings"&gt;9.2&lt;/ ...
##  [3] &lt;strong title="9.0 based on 2,569,907 user ratings"&gt;9.0&lt;/ ...
##  [4] &lt;strong title="9.0 based on 1,236,627 user ratings"&gt;9.0&lt;/ ...
##  [5] &lt;strong title="8.9 based on 767,804 user ratings"&gt;8.9&lt;/st ...
##  [6] &lt;strong title="8.9 based on 1,321,819 user ratings"&gt;8.9&lt;/ ...
##  [7] &lt;strong title="8.9 based on 1,784,964 user ratings"&gt;8.9&lt;/ ...
##  [8] &lt;strong title="8.9 based on 1,992,218 user ratings"&gt;8.9&lt;/ ...
##  [9] &lt;strong title="8.8 based on 1,806,022 user ratings"&gt;8.8&lt;/ ...
## [10] &lt;strong title="8.8 based on 745,449 user ratings"&gt;8.8&lt;/st ...
## [11] &lt;strong title="8.8 based on 2,007,365 user ratings"&gt;8.8&lt;/ ...
## [12] &lt;strong title="8.8 based on 2,046,596 user ratings"&gt;8.8&lt;/ ...
## [13] &lt;strong title="8.7 based on 2,280,290 user ratings"&gt;8.7&lt;/ ...
## [14] &lt;strong title="8.7 based on 1,612,066 user ratings"&gt;8.7&lt;/ ...
## [15] &lt;strong title="8.7 based on 1,257,435 user ratings"&gt;8.7&lt;/ ...
## [16] &lt;strong title="8.7 based on 1,866,072 user ratings"&gt;8.7&lt;/ ...
...
```
]
.pull-right[
&lt;img src="img/ratings.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Extract the text from the nodes

.pull-left[

```r
page %&gt;%
  html_nodes("strong") %&gt;%
  html_text()
```

```
##   [1] "9.2" "9.2" "9.0" "9.0" "8.9" "8.9" "8.9" "8.9" "8.8" "8.8"
##  [11] "8.8" "8.8" "8.7" "8.7" "8.7" "8.7" "8.7" "8.6" "8.6" "8.6"
##  [21] "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.6" "8.5" "8.5"
##  [31] "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5" "8.5"
##  [41] "8.5" "8.5" "8.5" "8.5" "8.5" "8.4" "8.4" "8.4" "8.4" "8.4"
##  [51] "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4" "8.4"
##  [61] "8.4" "8.4" "8.4" "8.4" "8.4" "8.3" "8.3" "8.3" "8.3" "8.3"
##  [71] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
##  [81] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3"
##  [91] "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.3" "8.2" "8.2"
## [101] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
## [111] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
## [121] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
## [131] "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2" "8.2"
## [141] "8.2" "8.2" "8.2" "8.2" "8.2" "8.1" "8.1" "8.1" "8.1" "8.1"
## [151] "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1" "8.1"
...
```
]
.pull-right[
&lt;img src="img/ratings.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Convert to numeric

.pull-left[

```r
page %&gt;%
  html_nodes("strong") %&gt;%
  html_text() %&gt;%
  as.numeric()
```

```
##   [1] 9.2 9.2 9.0 9.0 8.9 8.9 8.9 8.9 8.8 8.8 8.8 8.8 8.7 8.7 8.7
##  [16] 8.7 8.7 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.5 8.5
##  [31] 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5
##  [46] 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4
##  [61] 8.4 8.4 8.4 8.4 8.4 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3
##  [76] 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3
##  [91] 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [106] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [121] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [136] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.1 8.1 8.1 8.1 8.1
## [151] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
## [166] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
## [181] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
## [196] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
## [211] 8.1 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0
...
```
]
.pull-right[
&lt;img src="img/ratings.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

## Save as `ratings`

.pull-left[

```r
ratings &lt;- page %&gt;%
  html_nodes("strong") %&gt;%
  html_text() %&gt;%
  as.numeric()

ratings
```

```
##   [1] 9.2 9.2 9.0 9.0 8.9 8.9 8.9 8.9 8.8 8.8 8.8 8.8 8.7 8.7 8.7
##  [16] 8.7 8.7 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.6 8.5 8.5
##  [31] 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5 8.5
##  [46] 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4 8.4
##  [61] 8.4 8.4 8.4 8.4 8.4 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3
##  [76] 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3
##  [91] 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.3 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [106] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [121] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2
## [136] 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.2 8.1 8.1 8.1 8.1 8.1
## [151] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
## [166] 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1 8.1
...
```
]
.pull-right[
&lt;img src="img/ratings.png" width="100%" style="display: block; margin: auto;" /&gt;
]

---

class: middle

# Step 5. Create a data frame called `imdb_top_250`

---

## Create a data frame: `imdb_top_250`


```r
imdb_top_250 &lt;- tibble(
  title = titles, 
  year = years, 
  rating = ratings
  )

imdb_top_250
```

```
## # A tibble: 250 × 3
##   title                     year rating
##   &lt;chr&gt;                    &lt;dbl&gt;  &lt;dbl&gt;
## 1 The Shawshank Redemption  1994    9.2
## 2 The Godfather             1972    9.2
## 3 The Dark Knight           2008    9  
## 4 The Godfather Part II     1974    9  
## 5 12 Angry Men              1957    8.9
## 6 Schindler's List          1993    8.9
## # … with 244 more rows
```
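
---

## Check lengths before combining

`tibble()` needs the three vectors to have the same length, which only holds if every selector matched exactly one node per movie. A minimal sketch of an explicit check, using hypothetical stand-in vectors (the `toy_*` names are not part of the original analysis):

```r
library(tibble)

# Hypothetical stand-ins for the scraped vectors
toy_titles  <- c("Movie A", "Movie B", "Movie C")
toy_years   <- c(1994, 1972, 2008)
toy_ratings <- c(9.2, 9.2, 9.0)

# Stop early if a selector matched more or fewer nodes than the others
stopifnot(
  length(toy_titles) == length(toy_years),
  length(toy_years) == length(toy_ratings)
)

toy_top <- tibble(
  title  = toy_titles,
  year   = toy_years,
  rating = toy_ratings
)
toy_top
```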

---

<div id="htmlwidget-8f836166d559454ecd73" style="width:100%;height:400px;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-8f836166d559454ecd73">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100","101","102","103","104","105","106","107","108","109","110","111","112","113","114","115","116","117","118","119","120","121","122","123","124","125","126","127","128","129","130","131","132","133","134","135","136","137","138","139","140","141","142","143","144","145","146","147","148","149","150","151","152","153","154","155","156","157","158","159","160","161","162","163","164","165","166","167","168","169","170","171","172","173","174","175","176","177","178","179","180","181","182","183","184","185","186","187","188","189","190","191","192","193","194","195","196","197","198","199","200","201","202","203","204","205","206","207","208","209","210","211","212","213","214","215","216","217","218","219","220","221","222","223","224","225","226","227","228","229","230","231","232","233","234","235","236","237","238","239","240","241","242","243","244","245","246","247","248","249","250"],["The Shawshank Redemption","The Godfather","The Dark Knight","The Godfather Part II","12 Angry Men","Schindler's List","The Lord of the Rings: The Return of the King","Pulp Fiction","The Lord of the Rings: The Fellowship of the Ring","The Good, the Bad and the Ugly","Forrest Gump","Fight Club","Inception","The Lord of the Rings: The Two Towers","Star Wars: Episode V - The Empire Strikes Back","The Matrix","Goodfellas","One Flew Over the Cuckoo's Nest","Se7en","Seven 
Samurai","It's a Wonderful Life","The Silence of the Lambs","City of God","Saving Private Ryan","Life Is Beautiful","The Green Mile","Interstellar","Star Wars","Terminator 2: Judgment Day","Back to the Future","Spirited Away","Psycho","The Pianist","Léon: The Professional","Parasite","The Lion King","Gladiator","American History X","The Usual Suspects","The Departed","The Prestige","Casablanca","Top Gun: Maverick","Whiplash","The Intouchables","Modern Times","Hara-Kiri","Once Upon a Time in the West","Grave of the Fireflies","Rear Window","Alien","City Lights","Cinema Paradiso","Memento","Apocalypse Now","Indiana Jones and the Raiders of the Lost Ark","Django Unchained","WALL·E","The Lives of Others","Sunset Blvd.","Paths of Glory","The Shining","The Great Dictator","Witness for the Prosecution","Avengers: Infinity War","Aliens","American Beauty","Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb","The Dark Knight Rises","Spider-Man: Into the Spider-Verse","Joker","Oldboy","Braveheart","Amadeus","Toy Story","Coco","The Boat","Inglourious Basterds","Princess Mononoke","Avengers: Endgame","Once Upon a Time in America","Good Will Hunting","Requiem for a Dream","Toy Story 3","Your Name.","Singin' in the Rain","3 Idiots","Star Wars: Episode VI - Return of the Jedi","2001: A Space Odyssey","Eternal Sunshine of the Spotless Mind","Reservoir Dogs","High and Low","Capernaum","Citizen Kane","Lawrence of Arabia","The Hunt","North by Northwest","M","Vertigo","Amélie","A Clockwork Orange","Come and See","Full Metal Jacket","Double Indemnity","The Apartment","Scarface","Ikiru","The Sting","To Kill a Mockingbird","Taxi Driver","L.A. 
Confidential","Up","Heat","Hamilton","Metropolis","A Separation","Snatch","Die Hard","Incendies","Indiana Jones and the Last Crusade","Bicycle Thieves","Spider-Man: No Way Home","1917","Like Stars on Earth","Downfall","For a Few Dollars More","Batman Begins","Dangal","The Kid","Some Like It Hot","Everything Everywhere All at Once","The Father","All About Eve","Green Book","The Wolf of Wall Street","Judgment at Nuremberg","Unforgiven","Casino","Ran","Pan's Labyrinth","There Will Be Blood","The Sixth Sense","A Beautiful Mind","Monty Python and the Holy Grail","The Truman Show","Yojimbo","The Treasure of the Sierra Madre","Shutter Island","Rashomon","The Great Escape","Jurassic Park","Kill Bill: Vol. 1","No Country for Old Men","Finding Nemo","The Elephant Man","Raging Bull","Chinatown","Gone with the Wind","V for Vendetta","Inside Out","The Thing","Lock, Stock and Two Smoking Barrels","Dial M for Murder","The Secret in Their Eyes","The Bridge on the River Kwai","Howl's Moving Castle","Three Billboards Outside Ebbing, Missouri","Trainspotting","Gran Torino","Warrior","Fargo","My Neighbor Totoro","Prisoners","Million Dollar Baby","The Gold Rush","Blade Runner","Catch Me If You Can","On the Waterfront","The Third Man","Children of Heaven","Ben-Hur","12 Years a Slave","The General","Gone Girl","Wild Strawberries","Before Sunrise","The Deer Hunter","Harry Potter and the Deathly Hallows: Part 2","In the Name of the Father","The Grand Budapest Hotel","Mr. 
Smith Goes to Washington","The Wages of Fear","Barry Lyndon","Sherlock Jr.","Memories of Murder","Room","Hacksaw Ridge","Klaus","The Seventh Seal","Wild Tales","The Big Lebowski","How to Train Your Dragon","Mad Max: Fury Road","Mary and Max","Monsters, Inc.","Jaws","The Passion of Joan of Arc","Hotel Rwanda","Dead Poets Society","Tokyo Story","Pather Panchali","Platoon","Rocky","Ford v Ferrari","Stand by Me","The Terminator","Spotlight","Rush","Into the Wild","Network","The Wizard of Oz","Logan","Groundhog Day","Ratatouille","The Exorcist","Before Sunset","The Best Years of Our Lives","The Incredibles","To Be or Not to Be","Rebecca","The Grapes of Wrath","The Battle of Algiers","Hachi: A Dog's Tale","Cool Hand Luke","Amores perros","Pirates of the Caribbean: The Curse of the Black Pearl","The 400 Blows","La Haine","Persona","Dersu Uzala","Life of Brian","My Father and My Son","It Happened One Night","The Sound of Music","The Handmaiden","Aladdin","Gandhi","The Help","Jai Bhim","Beauty and the 
Beast"],[1994,1972,2008,1974,1957,1993,2003,1994,2001,1966,1994,1999,2010,2002,1980,1999,1990,1975,1995,1954,1946,1991,2002,1998,1997,1999,2014,1977,1991,1985,2001,1960,2002,1994,2019,1994,2000,1998,1995,2006,2006,1942,2022,2014,2011,1936,1962,1968,1988,1954,1979,1931,1988,2000,1979,1981,2012,2008,2006,1950,1957,1980,1940,1957,2018,1986,1999,1964,2012,2018,2019,2003,1995,1984,1995,2017,1981,2009,1997,2019,1984,1997,2000,2010,2016,1952,2009,1983,1968,2004,1992,1963,2018,1941,1962,2012,1959,1931,1958,2001,1971,1985,1987,1944,1960,1983,1952,1973,1962,1976,1997,2009,1995,2020,1927,2011,2000,1988,2010,1989,1948,2021,2019,2007,2004,1965,2005,2016,1921,1959,2022,2020,1950,2018,2013,1961,1992,1995,1985,2006,2007,1999,2001,1975,1998,1961,1948,2010,1950,1963,1993,2003,2007,2003,1980,1980,1974,1939,2005,2015,1982,1998,1954,2009,1957,2004,2017,1996,2008,2011,1996,1988,2013,2004,1925,1982,2002,1954,1949,1997,1959,2013,1926,2014,1957,1995,1978,2011,1993,2014,1939,1953,1975,1924,2003,2015,2016,2019,1957,2014,1998,2010,2015,2009,2001,1975,1928,2004,1989,1953,1955,1986,1976,2019,1986,1984,2015,2013,2007,1976,1939,2017,1993,2007,1973,2004,1946,2004,1942,1940,1940,1966,2009,1967,2000,2003,1959,1995,1966,1975,1979,2005,1934,1965,2016,1992,1982,2011,2021,1991],[9.2,9.2,9,9,8.9,8.9,8.9,8.9,8.8,8.8,8.8,8.8,8.7,8.7,8.7,8.7,8.7,8.6,8.6,8.6,8.6,8.6,8.6,8.6,8.6,8.6,8.6,8.6,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.5,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.4,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.3,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.2,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1
,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>title<\/th>\n      <th>year<\/th>\n      <th>rating<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[2,3]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>

---

## Clean up / enhance

This may or may not be a lot of work, depending on how messy the data are.

- See if you like what you got:


```r
glimpse(imdb_top_250)
```

```
## Rows: 250
## Columns: 3
## $ title  &lt;chr&gt; "The Shawshank Redemption", "The Godfather", "Th…
## $ year   &lt;dbl&gt; 1994, 1972, 2008, 1974, 1957, 1993, 2003, 1994, …
## $ rating &lt;dbl&gt; 9.2, 9.2, 9.0, 9.0, 8.9, 8.9, 8.9, 8.9, 8.8, 8.8…
```

- Add a variable for rank

```r
imdb_top_250 &lt;- imdb_top_250 %&gt;%
  mutate(rank = row_number()) %&gt;%
  relocate(rank)
```

---


```
## # A tibble: 250 × 4
##     rank title                                        year rating
##    &lt;int&gt; &lt;chr&gt;                                       &lt;dbl&gt;  &lt;dbl&gt;
##  1     1 The Shawshank Redemption                     1994    9.2
##  2     2 The Godfather                                1972    9.2
##  3     3 The Dark Knight                              2008    9  
##  4     4 The Godfather Part II                        1974    9  
##  5     5 12 Angry Men                                 1957    8.9
##  6     6 Schindler's List                             1993    8.9
##  7     7 The Lord of the Rings: The Return of the K…  2003    8.9
##  8     8 Pulp Fiction                                 1994    8.9
##  9     9 The Lord of the Rings: The Fellowship of t…  2001    8.8
## 10    10 The Good, the Bad and the Ugly               1966    8.8
## 11    11 Forrest Gump                                 1994    8.8
## 12    12 Fight Club                                   1999    8.8
## 13    13 Inception                                    2010    8.7
## 14    14 The Lord of the Rings: The Two Towers        2002    8.7
## 15    15 Star Wars: Episode V - The Empire Strikes …  1980    8.7
## 16    16 The Matrix                                   1999    8.7
## 17    17 Goodfellas                                   1990    8.7
## 18    18 One Flew Over the Cuckoo's Nest              1975    8.6
## 19    19 Se7en                                        1995    8.6
## 20    20 Seven Samurai                                1954    8.6
## # … with 230 more rows
```
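
---

The rank column is only meaningful if the rows are still in the order they were scraped. A minimal sketch on a hypothetical toy tibble (made-up titles and ratings, not the scraped data), using `row_number()` as an alternative to `1:nrow()` and checking that ratings never increase down the ranking, as the output suggests they should:

```r
library(dplyr)

# Hypothetical toy data standing in for the scraped table
toy_movies <- tibble(
  title = c("Film A", "Film B", "Film C", "Film D"),
  rating = c(9.2, 9.2, 9.0, 8.9)
)

ranked <- toy_movies %>%
  mutate(rank = row_number()) %>%  # position in the current row order
  relocate(rank)                   # move rank to the first column

# TRUE if ratings are non-increasing as rank goes from 1 to n
ratings_sorted <- all(diff(ranked$rating) <= 0)
ratings_sorted
```

If the check returns `FALSE`, the rows were probably reordered before the rank was added.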

---

class: middle

# What next?

---

.question[
Which years have the most movies on the list?
]

--


```r
imdb_top_250 %&gt;% 
  count(year, sort = TRUE)
```

```
## # A tibble: 86 × 2
##    year     n
##   &lt;dbl&gt; &lt;int&gt;
## 1  1995     8
## 2  2004     7
## 3  1957     6
## 4  2003     6
## 5  2009     6
## 6  2019     6
## # … with 80 more rows
```

---

.question[
Which 1995 movies made the list?
]

--


```r
imdb_top_250 %&gt;% 
  filter(year == 1995) %&gt;%
  print(n = 8)
```

```
## # A tibble: 8 × 4
##    rank title               year rating
##   &lt;int&gt; &lt;chr&gt;              &lt;dbl&gt;  &lt;dbl&gt;
## 1    19 Se7en               1995    8.6
## 2    39 The Usual Suspects  1995    8.5
## 3    73 Braveheart          1995    8.3
## 4    75 Toy Story           1995    8.3
## 5   113 Heat                1995    8.2
## 6   138 Casino              1995    8.2
## 7   186 Before Sunrise      1995    8.1
## 8   238 La Haine            1995    8
```
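
---

The `1995` and `n = 8` in the code were read off the previous output by eye. A minimal sketch, on a hypothetical toy tibble (made-up values, same column names), of carrying the busiest year forward programmatically instead; `toy_movies`, `top_year`, and `top_year_movies` are illustrative names, not part of the original analysis:

```r
library(dplyr)

# Hypothetical toy data standing in for imdb_top_250
toy_movies <- tibble(
  rank = 1:5,
  title = c("Film A", "Film B", "Film C", "Film D", "Film E"),
  year = c(1995, 1972, 1995, 2008, 1995),
  rating = c(9.2, 9.0, 8.9, 8.8, 8.7)
)

# Year with the most movies; if several years tie, only one of them is kept
top_year <- toy_movies %>%
  count(year, sort = TRUE) %>%
  slice(1) %>%
  pull(year)

# All movies from that year, however many there are
top_year_movies <- toy_movies %>%
  filter(year == top_year)

print(top_year_movies, n = Inf)
```

Using `n = Inf` prints every matching row, so the row count does not need to be known in advance.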

---

.question[
Visualize how the average rating of movies on the top 250 list varies by release year.
]

--

.panelset[
.panel[.panel-name[Plot]
&lt;img src="u2-d19-top-250-imdb_files/figure-html/unnamed-chunk-46-1.png" width="58%" style="display: block; margin: auto;" /&gt;
]
.panel[.panel-name[Code]


```r
imdb_top_250 %&gt;%
  group_by(year) %&gt;%
  summarise(avg_score = mean(rating)) %&gt;%  # one average rating per release year
  ggplot(aes(y = avg_score, x = year)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # linear trend line, no confidence band
  labs(x = "Year", y = "Average score")
```
]
]
    </textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"ratio": "16:9",
"highlightLines": true,
"highlightStyle": "solarized-light",
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
  window.dispatchEvent(new Event('resize'));
});
(function(d) {
  var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
  if (!r) return;
  s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
  d.head.appendChild(s);
})(document);

(function(d) {
  var el = d.getElementsByClassName("remark-slides-area");
  if (!el) return;
  var slide, slides = slideshow.getSlides(), els = el[0].children;
  for (var i = 1; i < slides.length; i++) {
    slide = slides[i];
    if (slide.properties.continued === "true" || slide.properties.count === "false") {
      els[i - 1].className += ' has-continuation';
    }
  }
  var s = d.createElement("style");
  s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
  d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
  var deleted = false;
  slideshow.on('beforeShowSlide', function(slide) {
    if (deleted) return;
    var sheets = document.styleSheets, node;
    for (var i = 0; i < sheets.length; i++) {
      node = sheets[i].ownerNode;
      if (node.dataset["target"] !== "print-only") continue;
      node.parentNode.removeChild(node);
    }
    deleted = true;
  });
})();
// add `data-at-shortcutkeys` attribute to <body> to resolve conflicts with JAWS
// screen reader (see PR #262)
(function(d) {
  let res = {};
  d.querySelectorAll('.remark-help-content table tr').forEach(tr => {
    const t = tr.querySelector('td:nth-child(2)').innerText;
    tr.querySelectorAll('td:first-child .key').forEach(key => {
      const k = key.innerText;
      if (/^[a-z]$/.test(k)) res[k] = t;  // must be a single letter (key)
    });
  });
  d.body.setAttribute('data-at-shortcutkeys', JSON.stringify(res));
})(document);
(function() {
  "use strict"
  // Replace <script> tags in slides area to make them executable
  var scripts = document.querySelectorAll(
    '.remark-slides-area .remark-slide-container script'
  );
  if (!scripts.length) return;
  for (var i = 0; i < scripts.length; i++) {
    var s = document.createElement('script');
    var code = document.createTextNode(scripts[i].textContent);
    s.appendChild(code);
    var scriptAttrs = scripts[i].attributes;
    for (var j = 0; j < scriptAttrs.length; j++) {
      s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
    }
    scripts[i].parentElement.replaceChild(s, scripts[i]);
  }
})();
(function() {
  var links = document.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
      links[i].target = '_blank';
    }
  }
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
  const hlines = d.querySelectorAll('.remark-code-line-highlighted');
  const preParents = [];
  const findPreParent = function(line, p = 0) {
    if (p > 1) return null; // traverse up no further than grandparent
    const el = line.parentElement;
    return el.tagName === "PRE" ? el : findPreParent(el, ++p);
  };

  for (let line of hlines) {
    let pre = findPreParent(line);
    if (pre && !preParents.includes(pre)) preParents.push(pre);
  }
  preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>

<script>
slideshow._releaseMath = function(el) {
  var i, text, code, codes = el.getElementsByTagName('code');
  for (i = 0; i < codes.length;) {
    code = codes[i];
    if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
      text = code.textContent;
      if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
          /^\$\$(.|\s)+\$\$$/.test(text) ||
          /^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
        code.outerHTML = code.innerHTML;  // remove <code></code>
        continue;
      }
    }
    i++;
  }
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src  = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
  if (location.protocol !== 'file:' && /^https?:/.test(script.src))
    script.src  = script.src.replace(/^https?:/, '');
  document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
  </body>
</html>
